VIENA2: A Driving Anticipation Dataset
Action anticipation is critical in scenarios where one needs to react before
the action is finalized. This is, for instance, the case in automated driving,
where a car needs to, e.g., avoid hitting pedestrians and respect traffic
lights. While solutions have been proposed to tackle subsets of the driving
anticipation tasks, by making use of diverse, task-specific sensors, there is
no single dataset or framework that addresses them all in a consistent manner.
In this paper, we therefore introduce a new, large-scale dataset, called
VIENA2, covering 5 generic driving scenarios, with a total of 25 distinct
action classes. It contains more than 15K full HD, 5s long videos acquired in
various driving conditions, weather, times of day, and environments, complemented
with a common and realistic set of sensor measurements. This amounts to more
than 2.25M frames, each annotated with an action label, corresponding to 600
samples per action class. We discuss our data acquisition strategy and the
statistics of our dataset, and benchmark state-of-the-art action anticipation
techniques, including a new multi-modal LSTM architecture with an effective
loss function for action anticipation in driving scenarios.
Comment: Accepted in ACCV 201
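The multi-modal recurrent classifier described above can be sketched as follows. This is a minimal NumPy toy assuming early fusion of video and sensor features into a single LSTM, with a per-timestep anticipation loss; the dimensions, fusion scheme, and loss are illustrative assumptions, not the authors' released architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def lstm_step(x, h, c, W, b):
    """One LSTM step; W maps the concatenated [x; h] to the 4 gates."""
    z = W @ np.concatenate([x, h]) + b
    H = h.size
    i, f, g, o = z[:H], z[H:2*H], z[2*H:3*H], z[3*H:]
    i, f, o = 1/(1+np.exp(-i)), 1/(1+np.exp(-f)), 1/(1+np.exp(-o))
    c = f * c + i * np.tanh(g)
    h = o * np.tanh(c)
    return h, c

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

T, D_VID, D_SENS, H, N_CLASSES = 5, 8, 4, 16, 25  # 25 classes, as in VIENA2
W = rng.normal(0, 0.1, (4*H, D_VID + D_SENS + H))
b = np.zeros(4*H)
W_out = rng.normal(0, 0.1, (N_CLASSES, H))

video = rng.normal(size=(T, D_VID))     # per-frame appearance features (stand-in)
sensors = rng.normal(size=(T, D_SENS))  # synchronized sensor measurements (stand-in)

h, c = np.zeros(H), np.zeros(H)
loss = 0.0
y_true = 3  # ground-truth action class for this clip
for t in range(T):
    x = np.concatenate([video[t], sensors[t]])  # early fusion of the two modalities
    h, c = lstm_step(x, h, c, W, b)
    p = softmax(W_out @ h)
    # anticipation-style loss: every prefix of the sequence is penalized,
    # encouraging a correct prediction before the action has completed
    loss += -np.log(p[y_true])
print(round(loss / T, 3))
```

The key point is that the loss is accumulated at every timestep, so the model is rewarded for committing to the right class early rather than only at the end of the clip.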
Deep Learning and Statistical Models for Time-Critical Pedestrian Behaviour Prediction
The time it takes for a classifier to make an accurate prediction can be
crucial in many behaviour recognition problems. For example, an autonomous
vehicle should detect hazardous pedestrian behaviour early enough for it to
take appropriate measures. In this context, we compare the switching linear
dynamical system (SLDS) and a three-layered bi-directional long short-term
memory (LSTM) neural network, which are applied to infer pedestrian behaviour
from motion tracks. We show that, though the neural network model achieves an
accuracy of 80%, it requires long sequences to achieve this (100 samples or
more). The SLDS has a lower accuracy of 74%, but it achieves this result with
short sequences (10 samples). To our knowledge, such a comparison on sequence
length has not been considered in the literature before. The results provide
key intuition into the suitability of the models for time-critical problems.
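As a toy illustration of why a switching dynamical model can commit after only a few samples, the sketch below classifies a short track by comparing the likelihood of two linear motion hypotheses. All dynamics, noise levels, and the crude likelihood here are invented assumptions for the demo, not the paper's SLDS.

```python
import numpy as np

rng = np.random.default_rng(1)

# two candidate linear dynamics on state [position, velocity]
A = {0: np.array([[1.0, 1.0], [0.0, 1.0]]),   # mode 0: constant velocity
     1: np.array([[1.0, 0.2], [0.0, 0.2]])}   # mode 1: decelerating ("stopping")
Q = 0.05  # process-noise standard deviation

def simulate(mode, T=10, x0=(0.0, 1.0)):
    """Generate a track of observed positions under a single dynamic mode."""
    x = np.array(x0)
    track = []
    for _ in range(T):
        x = A[mode] @ x + rng.normal(0, Q, 2)
        track.append(x[0])  # only position is observed
    return np.array(track)

def log_lik(track, mode, x0=(0.0, 1.0)):
    """Crude likelihood: propagate the mean, score one-step position residuals."""
    x = np.array(x0)
    ll = 0.0
    for z in track:
        x = A[mode] @ x
        ll += -0.5 * ((z - x[0]) / Q) ** 2
        x[0] = z  # re-anchor the position to the observation
    return ll

track = simulate(mode=1, T=10)  # 10 samples: the short-sequence regime above
pred = max((0, 1), key=lambda m: log_lik(track, m))
print(pred)
```

With only 10 observations the per-step residuals already separate the two hypotheses decisively, which mirrors the paper's finding that explicit dynamics models need far shorter sequences than the LSTM.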
Survey on Vision-based Path Prediction
Path prediction is a fundamental task for estimating how pedestrians or
vehicles are going to move in a scene. Because path prediction as a computer
vision task takes video as input, various information used for prediction,
such as the environment surrounding the target and the internal state of the
target, needs to be estimated from the video in addition to predicting paths.
Many prediction approaches that include understanding the environment and the
internal state have been proposed. In this survey, we systematically summarize
methods of path prediction that take video as input and extract features
from the video. Moreover, we introduce datasets used to evaluate path
prediction methods quantitatively.
Comment: DAPI 201
Surgical Video Motion Magnification with Suppression of Instrument Artefacts
Video motion magnification could directly highlight subsurface blood vessels
in endoscopic video in order to prevent inadvertent damage and bleeding.
Applying motion filters to the full surgical image is however sensitive to
residual motion from the surgical instruments and can impede practical
application due to aberration motion artefacts. By storing the temporal filter
response from local spatial frequency information for a single cardiovascular
cycle prior to tool introduction to the scene, a filter can be used to
determine if motion magnification should be active for a spatial region of the
surgical image. In this paper, we propose a strategy to reduce aberration due
to non-physiological motion for surgical video motion magnification. We present
promising results on endoscopic transnasal transsphenoidal pituitary surgery
with a quantitative comparison to recent methods using Structural Similarity
(SSIM), as well as qualitative analysis by comparing spatio-temporal cross
sections of the videos and individual frames.
Comment: Early accept to the International Conference on Medical Image
Computing and Computer Assisted Intervention (MICCAI) 2020. Presentation
available here: https://www.youtube.com/watch?v=kKI_Ygny76Q Supplementary
video available here: https://www.youtube.com/watch?v=8DUkcHI149
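The gating idea in this abstract, storing a reference temporal filter response before tool introduction and disabling magnification where the response later deviates, can be sketched per region as follows. The band edges, threshold, and signals are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np

fs = 30.0                    # video frame rate (frames per second)
t = np.arange(0, 4, 1 / fs)  # a 4 s clip

def band_energy(signal, lo=0.8, hi=2.0):
    """Spectral energy of a per-region intensity signal in a cardiac-like band."""
    spec = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(signal.size, 1 / fs)
    return spec[(freqs >= lo) & (freqs <= hi)].sum()

# region showing pure pulsatile motion at ~1.2 Hz (a subsurface vessel)
vessel = 0.1 * np.sin(2 * np.pi * 1.2 * t)
# the same region while an instrument sweeps through: a large non-physiological
# drift whose spectral leakage floods the monitored band
instrument = vessel + 3.0 * t

# reference response, stored for one cardiovascular cycle before tool introduction
ref = band_energy(vessel)

def magnify_allowed(region_signal, ref_energy, tol=3.0):
    """Gate: keep magnification on only while band energy stays near the reference."""
    return band_energy(region_signal) < tol * ref_energy

print(magnify_allowed(vessel, ref), magnify_allowed(instrument, ref))
```

Regions that keep matching their pre-tool reference stay magnified; regions disturbed by instrument motion are suppressed, which is the aberration-reduction strategy the abstract describes.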
Using Phase Instead of Optical Flow for Action Recognition
Currently, the most common motion representation for action recognition is optical flow. Optical flow is based on particle tracking, which adheres to a Lagrangian perspective on dynamics. In contrast to the Lagrangian perspective, the Eulerian model of dynamics does not track, but describes local changes. For video, an Eulerian phase-based motion representation, using complex steerable filters, has recently been employed successfully for motion magnification and video frame interpolation. Inspired by these previous works, here we propose learning Eulerian motion representations in a deep architecture for action recognition. We learn filters in the complex domain in an end-to-end manner. We design these complex filters to resemble complex Gabor filters, typically employed for phase-information extraction. We propose a phase-information extraction module, based on these complex filters, that can be used in any network architecture for extracting Eulerian representations. We experimentally analyze the added value of Eulerian motion representations, as extracted by our proposed phase extraction module, and compare with existing motion representations based on optical flow, on the UCF101 dataset.
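The phase-shift property that makes Eulerian motion representations work can be demonstrated with a hand-built 1D complex Gabor filter: the phase of its response shifts linearly with small translations, so phase differences between frames encode local motion without any tracking. The filter parameters below are arbitrary choices for the demo, not the learned filters from the paper.

```python
import numpy as np

def complex_gabor(size=31, freq=0.1, sigma=5.0):
    """A 1D complex Gabor filter: Gaussian window times a complex exponential."""
    x = np.arange(size) - size // 2
    return np.exp(-x**2 / (2 * sigma**2)) * np.exp(2j * np.pi * freq * x)

def local_phase(signal, gabor):
    """Phase of the complex filter response at the signal centre."""
    resp = np.convolve(signal, gabor, mode="same")
    return np.angle(resp[signal.size // 2])

g = complex_gabor()
x = np.arange(128)
frame0 = np.sin(2 * np.pi * 0.1 * x)          # a texture at the filter's frequency
frame1 = np.sin(2 * np.pi * 0.1 * (x - 1.0))  # the same texture shifted right by 1 px

dphi = local_phase(frame1, g) - local_phase(frame0, g)
# the phase difference, divided by 2*pi*freq, recovers the displacement
displacement = -dphi / (2 * np.pi * 0.1)
print(round(displacement, 2))
```

A learned phase-extraction module generalizes this: instead of a fixed Gabor, the complex filters are trained end-to-end, but the Eulerian principle of reading motion from local phase changes is the same.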
Interest region based motion magnification
by Manisha Verma and Shanmuganathan Rama
An RNN-Based IMM Filter Surrogate
The problem of varying dynamics of tracked objects, such as pedestrians, is traditionally tackled with approaches like the Interacting Multiple Model (IMM) filter using a Bayesian formulation. Following the current trend towards deep neural networks, this paper presents an RNN-based IMM filter surrogate. Similar to an IMM filter solution, the presented RNN-based model assigns a probability value to each performed dynamic and, based on these probabilities, outputs a multi-modal distribution over future pedestrian trajectories. The evaluation is done on synthetic data reflecting prototypical pedestrian maneuvers.
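The surrogate idea can be sketched as follows: a recurrent cell reads the observed track, assigns a probability to each candidate dynamic (as an IMM filter would to its motion models), and emits a mode-weighted multi-modal prediction of the next position. The weights and dynamics below are untrained, illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

H, D = 8, 2                       # hidden size, observation dim (x, y)
W_h = rng.normal(0, 0.3, (H, H + D))
W_m = rng.normal(0, 0.3, (2, H))  # head over the two candidate dynamics

# candidate dynamics, analogous to an IMM filter's motion models
modes = {
    "constant_velocity": lambda p, v: p + v,
    "stop": lambda p, v: p,
}

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

track = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])

h = np.zeros(H)
for p in track:                   # simple tanh RNN over the observed track
    h = np.tanh(W_h @ np.concatenate([p, h]))

weights = softmax(W_m @ h)        # per-dynamic probabilities (IMM-style)
v = track[-1] - track[-2]         # last observed velocity
hypotheses = [f(track[-1], v) for f in modes.values()]

# multi-modal output: each hypothesis carries its probability; the expectation
# gives a single point estimate when one is needed
expected = sum(w * hyp for w, hyp in zip(weights, hypotheses))
print(weights.round(3), expected.round(2))
```

In the trained model the mode probabilities come from data rather than random weights, but the output structure, a probability per dynamic plus one trajectory hypothesis each, is the point of the surrogate.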
RED: A simple but effective Baseline Predictor for the TrajNet Benchmark
In recent years, there has been a shift from modeling the tracking problem in a Bayesian formulation towards using deep neural networks. Towards this end, in this paper the effectiveness of various deep neural networks for predicting future pedestrian paths is evaluated. The analyzed deep networks rely solely, as in the traditional approaches, on observed tracklets without human-human interaction information. The evaluation is done on the publicly available TrajNet benchmark dataset [39], which assembles a repository of considerable and popular datasets for trajectory prediction. We show how a recurrent encoder with a dense layer stacked on top, referred to as the RED-predictor, is able to achieve the top rank in the TrajNet 2018 challenge compared to more elaborate models. Further, we investigate failure cases, give explanations for the observed phenomena, and give some recommendations for overcoming the demonstrated shortcomings.
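The RED architecture named above, a recurrent encoder with a dense layer on top, can be sketched in a few lines. This toy assumes the common normalizations of feeding per-step offsets into the encoder and predicting the future path as offsets from the last observed position; weights are random stand-ins, and the exact details are assumptions rather than the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(3)

H, D, T_PRED = 16, 2, 12           # hidden size, (x, y), prediction horizon
W_h = rng.normal(0, 0.2, (H, H + D))
W_dense = rng.normal(0, 0.2, (T_PRED * D, H))

def red_predict(tracklet):
    """Encode the observed tracklet, then emit all T_PRED future steps at once."""
    deltas = np.diff(tracklet, axis=0)   # per-step offsets, not absolute coordinates
    h = np.zeros(H)
    for d in deltas:                     # simple tanh recurrent encoder
        h = np.tanh(W_h @ np.concatenate([d, h]))
    # one dense layer maps the final hidden state to the whole future path
    offsets = (W_dense @ h).reshape(T_PRED, D)
    return tracklet[-1] + np.cumsum(offsets, axis=0)  # back to world coordinates

obs = np.stack([np.linspace(0, 7, 8), np.zeros(8)], axis=1)  # 8 observed steps
future = red_predict(obs)
print(future.shape)
```

Predicting offsets relative to the last observation is what makes such a simple baseline competitive: it removes absolute-position bias from the dataset, so the network only has to model motion.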